library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(leaflet)
In this blog post, we compare three kinds of graphs in R and Tableau: maps, histograms, and time series.
For the dataset, we use the Boston crime data (2016 - 2018) from Kaggle: https://www.kaggle.com/ankkur13/boston-crime-data
The data consists of 260,760 rows and 17 columns. Each row represents an individual incident. The columns are:
INCIDENT_NUMBER, OFFENSE_CODE, OFFENSE_CODE_GROUP, OFFENSE_DESCRIPTION, DISTRICT, REPORTING_AREA, SHOOTING, OCCURRED_ON_DATE, YEAR, MONTH, DAY_OF_WEEK, HOUR, UCR_PART, STREET, Lat, Long, Location.
For beginners, Tableau is generally much easier to start with than R because its learning curve is much gentler. Here is a learning curve comparison of different statistical software packages:
Learning Curve
For R, you need to know the basics of data structures and how to code. For example, you need to know that matrices, data frames, lists, and so on are each handled differently. To use Tableau, however, you don’t need to know how to code. After working with it for a couple of hours, you will understand how to use it. Basically, you learn how Tableau works by dragging and clicking its different features.
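To make the point about data structures concrete, here is a minimal base R sketch. The values are toy data invented for illustration, not taken from the Boston dataset, and the object names (`m`, `df`, `lst`) are hypothetical.

```r
# A matrix holds a single type and is indexed by [row, column].
m <- matrix(1:6, nrow = 2)   # filled column by column
m[2, 3]                      # element in row 2, column 3

# A data frame holds columns of different types; columns are accessed by name.
df <- data.frame(
  district  = c("A1", "B2"),
  incidents = c(10L, 25L)
)
busy <- df[df$incidents > 15, "district"]  # districts with more than 15 incidents

# A list can hold elements of any type and length.
lst <- list(name = "A1", hours = c(1, 13, 22))
lst[["hours"]]   # double brackets return the element itself (a numeric vector)
lst["hours"]     # single brackets return a list of length 1
```

The same idea, "get one item", needs different indexing syntax for each structure, which is part of what a new R user has to learn.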
Initial Page of Tableau
This is the initial page after you connect the data to Tableau. You can simply drag variables into the Rows and Columns boxes, and Tableau will make the graph automatically.
Here is the initial page for R:
Initial Page of R
R has more options for customization because we are coding the plot ourselves: the color scheme, margins, plot size, and everything else. Tableau, on the other hand, offers fewer customization options, but it works really well for changing minor details because it is quick.
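As a rough illustration of that kind of customization, the sketch below sets the fill color, labels, margins, and output size of a ggplot2 bar chart. The data frame `toy` and the output path are made up for this example and are not part of the Boston dataset.

```r
library(ggplot2)

# Toy data for illustration only: one made-up count per hour of the day.
toy <- data.frame(hour = 0:23, incidents = 50 + 10 * (0:23 %% 6))

p <- ggplot(toy, aes(x = hour, y = incidents)) +
  geom_col(fill = "steelblue") +                      # color scheme
  labs(title = "Incidents by hour (toy data)",
       x = "Hour of day", y = "Incidents") +
  theme_minimal() +
  theme(plot.margin = margin(10, 20, 10, 20))         # top, right, bottom, left (pt)

# Plot size is chosen when the figure is saved (width and height in inches).
out_file <- file.path(tempdir(), "toy_plot.png")
ggsave(out_file, p, width = 6, height = 4, dpi = 150)
```

Each of these settings is one line of code, and writing and finding those lines is where the extra learning effort goes.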
Below are graphical examples from both tools, R and Tableau, and we are going to answer some questions worth further consideration. We will start by comparing the mapping methods.
First, here are two maps of crime frequencies by district in Boston, one made with R and one with Tableau.
crime <- read_csv("crime.csv")
## Parsed with column specification:
## cols(
## INCIDENT_NUMBER = col_character(),
## OFFENSE_CODE = col_character(),
## OFFENSE_CODE_GROUP = col_character(),
## OFFENSE_DESCRIPTION = col_character(),
## DISTRICT = col_character(),
## REPORTING_AREA = col_double(),
## SHOOTING = col_logical(),
## OCCURRED_ON_DATE = col_datetime(format = ""),
## YEAR = col_double(),
## MONTH = col_double(),
## DAY_OF_WEEK = col_character(),
## HOUR = col_double(),
## UCR_PART = col_character(),
## STREET = col_character(),
## Lat = col_double(),
## Long = col_double(),
## Location = col_character()
## )
## Warning: 1055 parsing failures.
## row col expected actual file
## 1053 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1054 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1075 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1908 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1909 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## .... ........ .................. ...... ...........
## See problems(...) for more details.
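The parsing failures above occur because read_csv() guessed that SHOOTING is a logical column and then met the value "Y". One possible way to avoid the warning is to read the column as text and convert it afterwards. The sketch below uses a tiny made-up CSV written to a temporary file, and it assumes that a blank SHOOTING field means no shooting was recorded; that assumption should be checked against the real data.

```r
library(readr)

# Tiny made-up CSV for illustration; the real file is crime.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("INCIDENT_NUMBER,SHOOTING",
             "I001,",
             "I002,Y",
             "I003,"), tmp)

# Read SHOOTING as character so that "Y" is not treated as a parsing failure.
toy <- read_csv(tmp, col_types = cols(
  INCIDENT_NUMBER = col_character(),
  SHOOTING        = col_character()
))

# Assumption: "Y" marks a shooting, a blank (read as NA) means none.
toy$SHOOTING <- !is.na(toy$SHOOTING) & toy$SHOOTING == "Y"
```

The maps in this post do not use SHOOTING, so the warning does not affect them, but the fix matters for any analysis that does use that column.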
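# Drop incidents whose coordinates are missing or set to -1
crime <- crime %>%
  filter(!is.na(Lat), !is.na(Long)) %>%
  filter(Lat != -1, Long != -1)

# One color per district; named pal so it does not mask graphics::par()
n <- nlevels(as.factor(crime$DISTRICT))
pal <- colorFactor(topo.colors(n), domain = crime$DISTRICT)

# Draw one small circle per incident, colored by district
leaflet(crime) %>%
  addTiles() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(~Long, ~Lat,
    radius = 1,
    fillColor = ~pal(DISTRICT),
    stroke = FALSE, fillOpacity = 0.5
  )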
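The map shows where incidents occur, but it does not give the frequency per district as a number. A minimal sketch of how those counts could be tabulated with dplyr is shown below. It uses a small made-up tibble rather than the real data, and the district codes in it are placeholders.

```r
library(dplyr)

# Toy data for illustration only: one row per incident.
toy <- tibble(DISTRICT = c("A1", "B2", "A1", "C6", "B2", "A1"))

# Count incidents per district, most frequent first.
by_district <- toy %>%
  count(DISTRICT, sort = TRUE)

by_district
```

On the real data, the same call would be applied to `crime` in place of `toy`.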